UWICL: A Multi-Layered Parallel Image Computing Library for Single-Chip Multiprocessor-based Time-Critical Systems
Authors
Abstract
Many software libraries have been created to support the commonly used primitive operations needed in image processing, image analysis and image understanding. Generally, these libraries are based on a single-layered Application Program Interface (API). While a single-layered API provides a useful abstraction level for interacting with the library and hides unnecessary implementation details from the user, it does not produce an efficient program when a new algorithm is implemented by assembling selected existing library routines. The composed program suffers from inefficient data movement and additional loop control overhead. Furthermore, when a system employs a highly integrated processor such as a single-chip multiprocessor, the single-layered API prevents the user from fully utilizing the resources available in the system. In this article, we describe the University of Washington Image Computing Library (UWICL), a multi-layered high-performance parallel image computing library for Texas Instruments TMS320C80 Multimedia Video Processor (MVP)-based time-critical systems. Our goal in designing the UWICL is to provide the TMS320C80 user community with efficient and flexible image computing library routines. Under its multi-layered organization, the UWICL provides programmers with three levels of APIs: the MVP-level API, the DSP-level API, and APIs for data flow and processing cores. By optimizing the processing core functions, we have achieved high performance at the individual function level, and by allowing composition of sub-primitive library routines, we can achieve efficient image processing application development, avoiding most problems encountered in using single-layered library routines. The performance of the multi-layered organization versus the single-layered one is analysed and compared using Canny's edge detection algorithm as an example.
The balanced composition based on the multi-layered organization outperforms the single-layered composition by 14 to 41%, depending on the system's available memory bandwidth. As an adjunct to the UWICL, we have also developed an integrated MVP performance monitor (MPM). The MPM can identify the performance bottleneck of TMS320C80 applications and can be used in optimization by enabling the user to select the most efficient library composition level when building an application with the UWICL. In order to provide an overall performance evaluation model of the MVP, a simple MVP functional model has also been defined in the MPM. For the image thresholding operation, the difference between the measured execution time and the analysis
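The composition overhead described in the abstract can be illustrated with a minimal Python sketch. All function names here (`core_invert`, `core_threshold`, `blocks`, `single_layered`, `multi_layered`) are hypothetical and are not UWICL routines, and the `block_transfers` counter is a toy stand-in for data movement, not a model of the TMS320C80. The sketch only shows why chaining two whole-image routines costs an extra pass over an intermediate image, while composing processing cores inside one data-flow loop does not.

```python
# Illustrative only: hypothetical names, not the UWICL API.

def core_invert(pixels):
    """Processing core: invert the 8-bit pixels of one block."""
    return [255 - p for p in pixels]


def core_threshold(pixels, level):
    """Processing core: binarize one block against a threshold level."""
    return [255 if p >= level else 0 for p in pixels]


def blocks(image, block_size):
    """Data-flow helper: yield consecutive blocks of a flat image."""
    for start in range(0, len(image), block_size):
        yield image[start:start + block_size]


def single_layered(image, level, block_size, stats):
    """Chain two whole-image routines; each one runs its own block loop."""
    inverted = []
    for block in blocks(image, block_size):       # first pass over the image
        stats["block_transfers"] += 2             # one block load, one block store
        inverted.extend(core_invert(block))
    result = []
    for block in blocks(inverted, block_size):    # second pass over the intermediate
        stats["block_transfers"] += 2
        result.extend(core_threshold(block, level))
    return result


def multi_layered(image, level, block_size, stats):
    """Compose the two cores inside a single data-flow loop."""
    result = []
    for block in blocks(image, block_size):       # only one pass over the image
        stats["block_transfers"] += 2
        result.extend(core_threshold(core_invert(block), level))
    return result


if __name__ == "__main__":
    image = [0, 10, 100, 200, 250, 255, 128, 127]
    chained, fused = {"block_transfers": 0}, {"block_transfers": 0}
    same = single_layered(image, 128, 4, chained) == multi_layered(image, 128, 4, fused)
    print(same, chained["block_transfers"], fused["block_transfers"])
```

In this toy model the fused version produces the same output with half the block transfers and one loop instead of two. The 14 to 41% figure reported in the abstract comes from the authors' measurements and cannot be derived from this sketch.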
Similar resources
Multi MicroBlaze System for Parallel Computing
Embedded systems need more computational power to satisfy today’s applications’ needs, like audio/video encoding/decoding, image processing, etc. An option for increasing the computational power of a system is to include various microprocessors and make them work in parallel. This paper presents a study of the viability of making a multiprocessor system on a chip (MPSoC) using the MicroBlaze so...
Mpfpga-lib: a Family of Soft Multiprocessor with Noc from 12 to 48 Processors
Design productivity is one of the most important challenges facing future generation multiprocessor system on chip (MPSOC). The design productivity concerns hardware as well as software issues; however, software design productivity is more challenging, especially for parallel software. The MPFPGA-LIB project aims at providing a family of soft IP multiprocessors executable on FPGA to help software deve...
3D Network-on-Chip with on-chip DRAM: an empirical analysis for future Chip Multiprocessor
With the increasing number of on-chip components and the critical requirement for processing power, Chip Multiprocessor (CMP) has gained wide acceptance in both academia and industry during the last decade. However, the conventional bus-based on-chip communication schemes suffer from very high communication delay and low scalability in large scale systems. Network-on-Chip (NoC) has been proposed...
Critical Block Scheduling: A Thread-Level Parallelizing Mechanism for a Heterogeneous Chip Multiprocessor Architecture
Processor-in-Memory (PIM) architectures are developed for high-performance computing by integrating processing units with memory blocks into a single chip to reduce the performance gap between the processor and the memory. The PIM architecture combines heterogeneous processors in a single system. These processors are characterized by their computation and memory-access capabilities. Therefore, a ...
Contention for Critical Sections Can Reduce Performance and Scalability by Causing Thread Serialization. the Proposed Accelerated Critical Sections Mechanism Reduces This Limitation. Acs Executes Critical Sections on the High-performance Core of an Asymmetric Chip Multiprocessor
Extracting high performance from chip multiprocessors (CMPs) requires partitioning the application into threads that execute concurrently on multiple cores. Because threads cannot be allowed to update shared data concurrently, accesses to shared data are encapsulated inside critical sections. Only one thread executes a critical section at a given time; other threads wanting to execute the...
Journal title:
- Real-Time Imaging
Volume 2, Issue
Pages -
Publication date 1996